Reinforcement Learning without an Explicit Terminal State

Author

  • Martin Riedmiller
Abstract

The article introduces a reinforcement learning framework based on dynamic programming for a class of control problems in which no explicit terminal state exists. This situation arises especially in technical process control: the control task does not end once a predefined target value is reached; instead, the controller must keep controlling the system to prevent its output from drifting away from the target value again. We propose a set of assumptions and prove convergence of the value iteration method under them. From this, a new algorithm, which we call the fixed horizon algorithm, is derived. The performance of the proposed algorithm is compared with an approach that assumes the existence of an explicit terminal state. Finally, an application to a cart/double-pole system demonstrates the method on a difficult practical control task.
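To make the setting concrete, here is a minimal sketch of generic finite-horizon value iteration on a regulation task with no terminal state. It is not the paper's fixed horizon algorithm, whose details the abstract does not give. The toy system, the names `STATES`, `ACTIONS`, `DRIFT`, `step`, `cost`, and the horizon of 10 are all assumptions made for illustration. A constant drift pushes the output away from the target, so the controller must keep acting even after the target is reached.

```python
# Toy regulation problem (hypothetical, not taken from the paper).
STATES = list(range(-3, 4))  # discretised system output; the target value is 0
ACTIONS = [-2, -1, 0]        # available control inputs
DRIFT = 1                    # constant disturbance pushing the output upward


def step(state, action):
    """Deterministic transition: control plus drift, clipped to the state grid."""
    return max(STATES[0], min(STATES[-1], state + action + DRIFT))


def cost(state):
    """Unit cost for every step spent away from the target, zero at the target."""
    return 0.0 if state == 0 else 1.0


def finite_horizon_value_iteration(horizon):
    """Apply `horizon` undiscounted Bellman backups, starting from J_0 = 0."""
    values = {s: 0.0 for s in STATES}
    for _ in range(horizon):
        values = {
            s: cost(s) + min(values[step(s, a)] for a in ACTIONS)
            for s in STATES
        }
    return values


def greedy_policy(values):
    """In each state, choose the action whose successor state has the lowest value."""
    return {s: min(ACTIONS, key=lambda a: values[step(s, a)]) for s in STATES}


values = finite_horizon_value_iteration(horizon=10)
policy = greedy_policy(values)
```

In this sketch the target state is not absorbing: the greedy policy at the target still applies a non-zero control to cancel the drift, which is the kind of continued control the abstract describes.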

Similar resources

Solving the Credit Assignment Problem: The Interaction of Explicit and Implicit Learning with Internal and External State Information

In most problem-solving activities, feedback is received at the end of an action sequence. This creates a credit-assignment problem where the learner must associate the feedback with earlier actions, and the interdependencies of actions require the learner to either remember past choices of actions (internal state information) or rely on external cues in the environment (external state informat...

Epoch-incremental reinforcement learning algorithms

In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is performed and an environment model is created. In the epoch mode, on the basis of the environment model, the distances of past-active states to the terminal state are computed. These distances and the reinforcement terminal stat...

Reinforcement Learning of a Simple Visual Task

Raw visual data must be interpreted in some fashion to result in spatial perception. Some models for this interpretation hypothesize visual routines as underlying mechanisms for distilling the properties of a scene. Visual routines must determine these properties quickly without higher-level information. We present an approach for learning an example of a visual routine using reinforcement. Rat...

Braitenberg Soccer

Well-developed individual and collaborative skills, such as dribbling the ball, positioning, and passing are required for a team of robots to be successful against an opponent team in a robot soccer scenario. This paper proposes an approach to individual and collaborative skill learning, where the robots are modeled as Braitenberg vehicles, and the required skills are implemented as combination...

Reinforcement Learning Adaptive Control and Explicit Criterion Maximization

This paper reviews an existing algorithm for adaptive control based on explicit criterion maximization (ECM) and presents an extended version suited for reinforcement learning tasks. Furthermore, assumptions under which the algorithm converges to a local maximum of a long-term utility function are given. Such convergence theorems are very rare for reinforcement learning algorithms working wit...

Publication date: 1998